Matrix Factorization as Search
Authors
K. Kersting et al.

Abstract
Simplex Volume Maximization (SiVM) exploits distance geometry for efficiently factorizing gigantic matrices. It has proven successful in game, social media, and plant mining. Here, we review the distance geometry approach and argue that it generally suggests factorizing gigantic matrices using search-based instead of optimization techniques.

1 Interpretable Matrix Factorization

Many modern data sets are available in the form of a real-valued m × n matrix V of rank r ≤ min(m, n). The columns v1, ..., vn of such a data matrix encode information about n objects, each of which is characterized by m features. Typical examples of objects include text documents, digital images, genomes, stocks, or social groups. Examples of corresponding features are measurements such as term frequency counts, intensity gradient magnitudes, or incidence relations among the nodes of a graph. In most modern settings, the dimensions of the data matrix are large, so it is useful to determine a compressed representation that may be easier to analyze and interpret in light of domain-specific knowledge.

Formally, compressing a data matrix V ∈ R^(m×n) can be cast as a matrix factorization (MF) task. The idea is to determine factor matrices W ∈ R^(m×k) and H ∈ R^(k×n) whose product is a low-rank approximation of V. This amounts to the minimization problem min_{W,H} ‖V − WH‖², where ‖·‖ denotes a suitable matrix norm and one typically assumes k ≪ r. A common way of obtaining a low-rank approximation stems from truncating the singular value decomposition (SVD), where V = WSU = WH. The SVD is popular because it can be solved analytically and has significant statistical properties. The column vectors wi of W are orthogonal basis vectors that coincide with the directions of largest variance in the data.
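The truncated SVD described above can be sketched in a few lines of numpy; folding S and the right factor into H (so that V = W(SU) = WH) follows the notation used here, while the toy matrix and seed are illustrative:

```python
import numpy as np

def truncated_svd(V, k):
    """Rank-k approximation V ~ W @ H obtained by truncating the SVD.

    In the text's notation V = W S U = W H, so S and the right factor
    are folded into H.
    """
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    W = U[:, :k]                    # orthogonal basis vectors (largest-variance directions)
    H = np.diag(s[:k]) @ Vt[:k, :]  # H = S U
    return W, H

# Toy data matrix of rank <= 5.
rng = np.random.default_rng(0)
V = rng.standard_normal((6, 5)) @ rng.standard_normal((5, 8))

W, H = truncated_svd(V, k=5)
print(np.allclose(V, W @ H))  # True: k matches the rank, so the truncation is exact
```

For k smaller than the rank, the truncation is the best rank-k approximation in the Frobenius norm (Eckart–Young), which is exactly the minimization problem stated above.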
Although there are many successful applications of the SVD, for instance in information retrieval, it has been criticized because the wi may lack interpretability with respect to the field from which the data are drawn [6]. For example, the wi may point in the direction of negative orthants even though the data itself is strictly non-negative. Nevertheless, data analysts are often tempted to reify, i.e., to assign a physical meaning or interpretation to, large singular components. In most cases, however, this is not valid. Even if reification is justified, the interpretative claim cannot arise from the mathematics, but must be based on an intimate knowledge of the application domain.

The most common way of compressing a data matrix such that the resulting basis vectors are interpretable and faithful to the data at hand is to impose additional constraints on the matrices W and H. An example is non-negative MF (NMF), which imposes the constraint that the entries of W and H are non-negative. Another example of a constrained MF method is archetypal analysis (AA) as introduced by [3]. It considers the NMF problem where W ∈ R^(n×k) and H ∈ R^(k×n) are additionally required to be column-stochastic matrices, i.e., they are to be non-negative and each of their columns is to sum to 1. AA therefore represents every column vector in V as a convex combination of convex combinations of a subset of the columns of V. Such constrained MF problems are traditionally solved analytically since they constitute quadratic optimization problems. Although they are convex in either W or H, they are not convex in WH, so that one suffers from many local minima. Moreover, their memory and runtime requirements scale quadratically with the number n of data points and therefore cannot easily cope with modern large-scale problems.

⋆ The authors would like to thank the anonymous reviewers for their comments. This work was partly supported by the Fraunhofer ATTRACT fellowship STREAM.
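The text does not spell out how the NMF constraint is enforced; a standard choice (not the only one) is Lee–Seung multiplicative updates, which preserve non-negativity by construction. A minimal sketch, with the iteration count, seed, and eps smoothing as illustrative assumptions:

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9):
    """Non-negative MF via multiplicative updates: V ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Ratios of non-negative quantities keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((8, 10))  # strictly non-negative data
W, H = nmf(V, k=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the updates alternate between the two convex subproblems (in W for fixed H and vice versa), they converge only to a local minimum of the non-convex joint problem, which is exactly the drawback noted above.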
A recent attempt to circumvent these problems is the CUR decomposition [6]. It aims at minimizing ‖V − CUR‖, where the columns of C are selected from the columns of V, the rows of R are selected from the rows of V, and U contains scaling coefficients. Similar to AA, the factorization is expressed in terms of actual data elements and hence is readily interpretable. However, in contrast to AA, the selection is not determined analytically but by means of importance sampling from the data at hand. While this reduces memory and runtime requirements, it still requires a complete view of the data. Therefore, none of the methods discussed so far easily applies to the growing datasets that nowadays become increasingly common.

2 Matrix Factorization as Search

MF by means of column subset selection allows one to cast MF as a volume maximization problem rather than as norm minimization [2]. It can be shown that a subset W of k columns of V yields a better factorization than any other subset of size k if the volume of the parallelepiped spanned by the columns of W exceeds the volumes spanned by the other selections. Following this line, we have recently proposed a linear-time approximation for maximizing the volume of the simplex ∆W whose vertices correspond to the selected columns [9]. Intuitively, we aim at approximating the data by means of convex combinations of selected vectors W ⊂ V. That is, we aim at compressing the data such that vi ≈ Σ_{j=1..k} wj h_ji where h_i ≥ 0 ∧ 1ᵀh_i = 1 ∀i. Then, data vectors situated inside the simplex ∆W can be reconstructed perfectly, i.e., ‖vi − W h_i‖ = 0. Accordingly, the larger the volume of ∆W, the better the corresponding low-rank approximation of the entire data set will be. Such volume maximization approaches are more efficient than methods based on minimizing a matrix norm.
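The parallelepiped-volume criterion of [2] can be made concrete via the Gram determinant: the columns of W span a parallelepiped of volume sqrt(det(WᵀW)). A brute-force subset search over all size-k column subsets (exponential in n, so purely an illustration of the search objective, not of the efficient method):

```python
import itertools
import numpy as np

def parallelepiped_volume(W):
    """Volume spanned by the columns of W: sqrt(det(W^T W)), valid for m >= k."""
    return np.sqrt(max(np.linalg.det(W.T @ W), 0.0))

def best_columns_bruteforce(V, k):
    """Exhaustively pick the k columns of V spanning the largest volume."""
    n = V.shape[1]
    return max(itertools.combinations(range(n), k),
               key=lambda idx: parallelepiped_volume(V[:, list(idx)]))

# Columns 0-2 are scaled axes; column 3 lies in their span with small norm.
V = np.array([[4.0, 0.0, 0.0, 1.0],
              [0.0, 3.0, 0.0, 1.0],
              [0.0, 0.0, 2.0, 0.5]])
print(best_columns_bruteforce(V, 3))  # -> (0, 1, 2)
```

The winning subset is the one whose columns are "most spread out", which is precisely why such selections tend to be interpretable extremes of the data rather than abstract directions.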
Whereas the latter requires computing both matrices W and H in every iteration, volume maximization methods compute the coefficient matrix H only after the matrix of basis vectors W has been determined. Moreover, whereas evaluating ‖V − WH‖ is of complexity O(n) for n data points vi, evaluating Vol(W) or Vol(∆W) requires only O(k) for the k ≪ n currently selected columns. Transferring volume maximization from parallelepipeds to simplices has the added benefit that it allows for the use of distance geometry. Given the lengths d_ij of the edges between the k vertices of a (k − 1)-simplex ∆W, its volume Vol(∆W) can be computed from this distance information only:

  Vol(∆W) = sqrt( (−1)^k / (2^(k−1) ((k−1)!)²) · det(A) ),   (*)

where det(A) is the Cayley–Menger determinant [1]. And it naturally leads to search-based MF approaches. A simple greedy best-first search algorithm for MF that immediately follows from what has been discussed so far works as follows. Given a data matrix V, we determine an initial selection X2 = {a, b}, where va and vb are the two columns that are maximally far apart. That is, we initialize with the largest possible 1-simplex. Then, we consider every possible extension of this simplex by another vertex and apply (*) to compute the corresponding volume Vol′. The extended simplex that yields the largest volume is considered for further expansion. This process continues until k columns have been selected from V. Lower-bounding (*) by assuming that all selected vertices are equidistant turns this greedy best-first search into the linear-time MF approach called Simplex Volume Maximization (SiVM) [9]. SiVM has proven successful for the fast and interpretable analysis of massive game and Twitter data [7] and of large, sparse graphs [8], as well as when combined with statistical learning techniques for drought stress of plants [4, 5]. However, we can explore and exploit the link established between MF and search even further.
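The Cayley–Menger volume (*) and the greedy best-first loop can be transcribed directly; note this sketch evaluates (*) exactly for every candidate, whereas SiVM proper replaces it with the equidistance lower bound to reach linear time, and the dense pairwise-distance matrix here is an illustrative shortcut:

```python
import math
import numpy as np

def cm_simplex_volume(D):
    """Volume of a (k-1)-simplex from the k x k matrix D of vertex distances,
    via the bordered Cayley-Menger determinant as in (*)."""
    k = D.shape[0]
    A = np.ones((k + 1, k + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = D ** 2
    coeff = (-1.0) ** k / (2.0 ** (k - 1) * math.factorial(k - 1) ** 2)
    return math.sqrt(max(coeff * np.linalg.det(A), 0.0))

def greedy_simplex_search(V, k):
    """Greedy best-first column selection maximizing simplex volume."""
    D = np.linalg.norm(V[:, :, None] - V[:, None, :], axis=0)  # pairwise column distances
    a, b = np.unravel_index(np.argmax(D), D.shape)
    sel = [int(a), int(b)]                                     # largest possible 1-simplex
    while len(sel) < k:
        rest = [c for c in range(V.shape[1]) if c not in sel]
        sel.append(max(rest,
                       key=lambda c: cm_simplex_volume(D[np.ix_(sel + [c], sel + [c])])))
    return sel

# Three extreme points and one interior point; the interior column is skipped.
V = np.array([[0.0, 10.0, 0.0, 1.0],
              [0.0, 0.0, 10.0, 1.0]])
print(sorted(greedy_simplex_search(V, 3)))  # -> [0, 1, 2]
```

Because the selection only ever consults pairwise distances, the data vectors themselves are not needed once D (or the relevant distances) is available, which is what makes the distance-geometry view attractive.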
For instance, a greedy stochastic hill-climbing algorithm (sSiVM) starts with a random initial selection of k columns of V and iteratively improves on it. In each iteration, a new candidate column is chosen at random and tested against the current selection: for each of the currently selected columns, we verify whether replacing it by the new candidate would increase the simplex volume according to (*). The column whose replacement results in the largest gain is replaced. An apparent benefit of sSiVM is that it does not require batch processing or knowledge of the entire data matrix. It allows for timely data matrix compression even if the data arrive one at a time. Since it consumes only O(k) memory, it represents a truly low-cost approach to MF. In an ongoing project on social media usage, we are running a script that constantly downloads user-annotated images from the Internet. We are thus in need of a method that allows for compressing this huge collection of data in an online fashion. sSiVM appears to provide a solution. To illustrate this, we considered a standard data matrix representing Internet images collected by [10]. This publicly available data set has the images re-scaled to a resolution of 32 × 32 pixels in 3 color channels and also provides an abstract representation using 384-dimensional GIST feature vectors. Up to the writing of the present paper, sSiVM processed a stream of about 1,600,000 images (randomly selected).
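The sSiVM replacement test can be sketched as follows. For brevity this sketch scores candidates with the vertex-based simplex volume sqrt(det(MᵀM))/(k−1)! instead of the distance-based formula (*); both give the same value, and the streaming loop and seed are illustrative:

```python
import math
import numpy as np

def simplex_volume(S):
    """Volume of the simplex whose vertices are the columns of S."""
    M = S[:, 1:] - S[:, :1]            # edge vectors from the first vertex
    g = np.linalg.det(M.T @ M)         # Gram determinant
    return math.sqrt(max(g, 0.0)) / math.factorial(S.shape[1] - 1)

def ssivm_step(S, x):
    """One sSiVM iteration: try replacing each selected column by candidate x
    and apply the replacement with the largest volume gain, if any.
    Only the k selected columns are kept, hence O(k) memory."""
    best_vol, best_j = simplex_volume(S), None
    for j in range(S.shape[1]):
        T = S.copy()
        T[:, j] = x
        v = simplex_volume(T)
        if v > best_vol:
            best_vol, best_j = v, j
    if best_j is not None:
        S[:, best_j] = x
    return S

# Stream random 3-D points one at a time into a selection of k = 4 columns.
rng = np.random.default_rng(0)
S = rng.random((3, 4))
v0 = simplex_volume(S)
for _ in range(500):
    S = ssivm_step(S, rng.random(3))
print(simplex_volume(S) >= v0)  # True: the volume never decreases
```

Since each candidate is inspected once and then discarded, the procedure never needs a complete view of the data matrix, matching the online setting described above.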
Publication date: 2012